Under the supervision of Blaise Hanczar, Farida Zehraoui and Franck Augé
2024-06-12
How to reduce the number of parameters?
How to compute interactions specific to each patient?
How to pick relevant information from input data \(X = [x_i]_{1 \leq i \leq L}\)?
How to apply self-attention to large vectors?
Group related features together and apply self-attention to each group.
Grouping Strategies
\[ \begin{align} X_G &= \mathcal{T}\left(X\right) \\ &= \left[X_{g_1}, \cdots, X_{g_4}\right] \end{align} \]
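The grouping transform \(\mathcal{T}\) above can be sketched as plain index selection. A minimal sketch, assuming a flat expression vector and a hypothetical pathway-based grouping into four groups (the indices below are made up for illustration):

```python
import numpy as np

# Hypothetical gene-expression vector for one patient (L = 8 genes)
x = np.arange(8.0)

# Assumed grouping: indices of the genes in each group g_1, ..., g_4
# (e.g. derived from pathway annotations)
groups = [[0, 1], [2, 3], [4, 5], [6, 7]]

# T(X): reorder the flat input into per-group blocks [X_{g_1}, ..., X_{g_4}]
x_grouped = [x[idx] for idx in groups]
print([g.tolist() for g in x_grouped])
```

In practice groups may overlap or differ in size, which is why the result is a list of blocks rather than a single reshaped array.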
Intra-group interactions
Gene grouping creates unwanted restrictions
Genes in group \(g_1\) cannot interact with genes from other groups
Restore group interactions with the Attention mechanism.
\[ X'_{g_i} = \operatorname{FCN}\left(X_{g_i}\right) \]
\[ X'_G = \left[X'_{g_1}, \cdots, X'_{g_4}\right] \]
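The per-group \(\operatorname{FCN}\) producing \(X'_{g_i}\) can be sketched as one fully connected layer per group. This is a minimal numpy sketch with assumed sizes (2 input features per group, embedding dimension 3); the real model's depth and dimensions may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def fcn(x, W, b):
    # One fully connected layer with ReLU: X'_{g_i} = FCN(X_{g_i})
    return np.maximum(x @ W + b, 0.0)

# Four hypothetical groups of 2 features each, embedded to dimension d = 3
groups = [rng.normal(size=2) for _ in range(4)]
params = [(rng.normal(size=(2, 3)), np.zeros(3)) for _ in range(4)]  # one FCN per group

# X'_G = [X'_{g_1}, ..., X'_{g_4}], stacked as a (4, d) matrix of group embeddings
x_prime = np.stack([fcn(g, W, b) for g, (W, b) in zip(groups, params)])
print(x_prime.shape)  # (4, 3): one embedding per group
```

Each group gets its own weights, so the number of parameters scales with the group sizes rather than with all \(L^2\) feature pairs.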
Inter-groups interactions
New representation of each group taking into consideration features from other groups
\[ \begin{align} \color{query}q_i &= \color{query}X'_{g_i} \cdot W_q^h \\ \color{key}k_i &= \color{key}X'_{g_i} \cdot W_k^h \\ \color{value}v_i &=\color{value} X'_{g_i} \cdot W_v^h \end{align} \]
\[ \begin{align} Z &= \operatorname{MultiHeadAttention}\left(X'_G\right) \\ &= \operatorname{concat}\left(\left[h_1, \cdots, h_H \right]\right) \\ h_i &= \operatorname{Attention}\left({\color{query}Q},{\color{key}K},{\color{value}V}\right) \end{align} \]
:::
:::
::::
Across cancers different interactions are learnt
Identified pathways:
Identified interactions:
Omics were analyzed individually, but a phenotype results from their interaction
Combine the different omics in a single model.
The attention mechanism can capture interactions between two vectors
\[\begin{align} Z_{\beta \rightarrow \alpha} &= \operatorname{CrossAtt}_{\beta \rightarrow \alpha}\left(X_{\alpha}, X_{\beta} \right) \\ &= \operatorname{Attention}\left(Q_{\alpha},K_{\beta},V_{\beta} \right) \end{align}\]
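The cross-attention \(Z_{\beta \rightarrow \alpha}\) above differs from self-attention only in where the queries and keys/values come from. A minimal sketch with assumed token counts and dimensions (5 tokens for modality \(\alpha\), 3 for \(\beta\), dimension 4) and random stand-in projections:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(Xa, Xb, Wq, Wk, Wv):
    # Z_{beta -> alpha} = Attention(Q_alpha, K_beta, V_beta):
    # queries from modality alpha, keys and values from modality beta
    Q, K, V = Xa @ Wq, Xb @ Wk, Xb @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

rng = np.random.default_rng(0)
X_alpha = rng.normal(size=(5, 4))  # e.g. 5 gene-expression tokens (assumed)
X_beta = rng.normal(size=(3, 4))   # e.g. 3 methylation tokens (assumed)
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))

Z = cross_attention(X_alpha, X_beta, Wq, Wk, Wv)
print(Z.shape)  # (5, 4): one output per alpha token, enriched with beta information
```

The output keeps the length of modality \(\alpha\), so each \(\alpha\) token is re-expressed as a mixture of \(\beta\) values.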
Considering all modality pairs gives \(n^2\) pairs
Only consider pairs known to interact
Layer-wise relevance propagation
Patients can be diagnosed efficiently, but what are the disease drivers? How can the patient be treated?
What are the important genes or potential biomarkers?
What are the actions that could lead a patient to a healthier state?
Counterfactuals
How would \(x\) change if \(y\) had been \(y^{\prime}\)?
\(y\) was predicted because input \(x\) had values \(\left(x_{1}, x_{2}, x_{3}, \ldots\right)\). If \(x\) instead had values \(x_{1}^{\prime}\) and \(x_{2}^{\prime}\) while the other variables \(\left(x_{3}, \ldots\right)\) had remained the same, \(y^{\prime}\) would have been predicted.
(Wachter et al., 2017)
\[ \operatorname*{argmin}_{x^{\text{CF}}} \mathcal{L}\left(g\left(x^{\text{CF}}\right), y^{\text{CF}} \right) + d\left(x^{\text{CF}}, x\right) \]
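The Wachter-style objective above can be minimized by gradient descent on the input. A minimal sketch assuming a toy logistic classifier \(g\) (a stand-in for the trained model), a squared loss for \(\mathcal{L}\), and a squared \(L_2\) distance for \(d\), with gradients written out by hand:

```python
import numpy as np

# Toy differentiable classifier g(x) = sigmoid(w.x + b), a stand-in for the model
w, b = np.array([2.0, -1.0]), 0.0
g = lambda x: 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = np.array([-1.0, 1.0])  # original input, predicted class 0
y_cf = 1.0                 # desired counterfactual class
lam = 0.1                  # weight of the distance term d(x_cf, x)

x_cf = x.copy()
for _ in range(500):
    p = g(x_cf)
    # gradient of (g(x_cf) - y_cf)^2 + lam * ||x_cf - x||^2 w.r.t. x_cf
    grad = 2 * (p - y_cf) * p * (1 - p) * w + 2 * lam * (x_cf - x)
    x_cf -= 0.1 * grad

print(g(x) < 0.5, g(x_cf) > 0.5)  # original class 0, counterfactual flipped to 1
```

The distance term keeps \(x^{\text{CF}}\) close to \(x\), but nothing in this objective forces the result onto the data manifold, which motivates the question that follows.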
Is it sufficient to obtain realistic and actionable points?
Data manifold closeness = respect the original data distribution
GANs capture the data distribution (Goodfellow et al., 2014)
\[ \mathcal{L} = {\color{manifold}\mathbb{E}_{x\sim p_{d}}\left[ D\left(x\right)\right] - \mathbb{E}_{x^{\text{CF}}\sim p_{g}}\left[ D\left(x^{\text{CF}}\right)\right] + \lambda \mathbb{E}_{\tilde{x}\sim p_{g}}\left[ {\left( {\left\|\nabla_{\tilde{x}}D\left(\tilde{x}\right) \right\|}_{2} -1 \right)}^{2}\right]} \\ + {\color{cf}\mathcal{L}_{\operatorname{Cl}} + \mathcal{L}_{\operatorname{Cl}_{T}}} + {\color{sparsity}\mathcal{L}_{\text{Reg}}\left(G\right)} \]
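The first colored part of the loss is the WGAN critic objective with a gradient penalty evaluated on interpolates \(\tilde{x}\) between real samples and generated counterfactuals. A minimal sketch of that term, assuming a linear critic \(D(x) = w \cdot x\) so that \(\nabla_x D(x) = w\) in closed form (real critics need autograd):

```python
import numpy as np

# Assumed linear critic D(x) = w.x, chosen so the input gradient is just w
w = np.array([0.6, 0.8])
D = lambda x: x @ w

rng = np.random.default_rng(0)
x_real = rng.normal(size=(4, 2))  # samples from the data distribution p_d
x_cf = rng.normal(size=(4, 2))    # generated counterfactuals from p_g
eps = rng.uniform(size=(4, 1))
x_tilde = eps * x_real + (1 - eps) * x_cf  # interpolates where the penalty is evaluated

# gradient penalty: E[(||grad_x D(x_tilde)||_2 - 1)^2]
grad_norm = np.linalg.norm(np.broadcast_to(w, x_tilde.shape), axis=1)
penalty = ((grad_norm - 1.0) ** 2).mean()

critic_loss = D(x_real).mean() - D(x_cf).mean() + 10.0 * penalty
print(round(penalty, 6))  # 0.0, since ||w||_2 = 1 for this particular critic
```

The penalty pushes the critic toward being 1-Lipschitz, which is what makes the Wasserstein formulation well behaved; the classification and regularization terms of the full loss are omitted here.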
| Metric | Value |
|---|---|
| \(L_1\) | 2440 |
| \(L_2\) | 30 |
| \(L_{\infty}\) | 1 |
| \(L_0\) | 0.52 |
| \(\mathcal{A}_{\text{kNN}}\) | 0 |
| \(\mathcal{A}_{\text{Oracle}}\) | 0.94 |
GDA: gene-disease association from DisGeNET / COSMIC: Catalogue Of Somatic Mutations In Cancer
How to compute attention between scalar values?
\[ A_{ij} = \operatorname{softmin}\left(\left|Q_{i} - K_{j} \right| \right) \]
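The scalar attention \(A_{ij}\) above can be sketched directly: a \(\operatorname{softmin}\) over absolute differences gives the largest weight to the closest key. A minimal numpy sketch with made-up scalar queries and keys (the actual model applies this per feature and uses a Triton kernel for efficiency):

```python
import numpy as np

def softmin(z, axis=-1):
    # softmin(z) = softmax(-z): smaller entries get larger weights
    e = np.exp(-(z - z.min(axis=axis, keepdims=True)))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical scalar queries and keys (e.g. individual gene values)
Q = np.array([0.0, 1.0, 2.0])
K = np.array([0.0, 0.5, 2.0])

# A_ij = softmin(|Q_i - K_j|): each query attends most to its nearest key
A = softmin(np.abs(Q[:, None] - K[None, :]))
print(A.shape, A.argmax(axis=1))
```

Unlike dot-product attention, this similarity is defined for scalars without any projection to a higher-dimensional space.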
Efficient implementation with Triton.
Knowledge is incomplete or may contain errors. How to handle this?
Knowledge is iteratively constructed, and omics measurements represent a mean over all occurring pathways.
What to do about unannotated features?
Thank you
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| No Gate | 0.980 ± 0.001 | 0.982 ± 0.002 | 0.979 ± 0.002 | 0.980 ± 0.002 |
| Gate | 0.987 ± 0.001 | 0.989 ± 0.001 | 0.987 ± 0.001 | 0.987 ± 0.001 |
PhD defense - Aurélien BEAUDE